Predicting local malaria exposure using a Lasso-based two-level cross validation algorithm
نویسندگان
چکیده
Recent studies have highlighted the importance of local environmental factors to determine the fine-scale heterogeneity of malaria transmission and exposure to the vector. In this work, we compare a classical GLM model with backward selection with different versions of an automatic LASSO-based algorithm with 2-level cross-validation aiming to build a predictive model of the space and time dependent individual exposure to the malaria vector, using entomological and environmental data from a cohort study in Benin. Although the GLM can outperform the LASSO model with appropriate engineering, the best model in terms of predictive power was found to be the LASSO-based model. Our approach can be adapted to different topics and may therefore be helpful to address prediction issues in other health sciences domains.
منابع مشابه
Regression Trees and Random forest based feature selection for malaria risk exposure prediction
This paper deals with prediction of anopheles number, the main vector of malaria risk, using environmental and climate variables. The variables selection is based on an automatic machine learning method using regression trees, and random forests combined with stratified two levels cross validation. The minimum threshold of variables importance is accessed using the quadratic distance of variabl...
متن کاملCoordinate Descent Algorithms for Lasso Penalized Regression
Imposition of a lasso penalty shrinks parameter estimates toward zero and performs continuous model selection. Lasso penalized regression is capable of handling linear regression problems where the number of predictors far exceeds the number of cases. This paper tests two exceptionally fast algorithms for estimating regression coefficients with a lasso penalty. The previously known ℓ2 algorithm...
متن کاملPredicting The Type of Malaria Using Classification and Regression Decision Trees
Predicting The Type of Malaria Using Classification and Regression Decision Trees Maryam Ashoori1 *, Fatemeh Hamzavi2 1School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran 2School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran Abstract Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...
متن کاملDevelopment and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles
There have been relatively few publications using linear regression models to predict a continuous response based on microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We have evaluated three linear regression algorithms that can be used for the prediction of a continuous response based on high d...
متن کاملComparison of Genomic Selection Models to Predict Flowering Time and Spike Grain Number in Two Hexaploid Wheat Doubled Haploid Populations
Genomic selection (GS) is becoming an important selection tool in crop breeding. In this study, we compared the ability of different GS models to predict time to young microspore (TYM), a flowering time-related trait, spike grain number under control conditions (SGNC) and spike grain number under osmotic stress conditions (SGNO) in two wheat biparental doubled haploid populations with unrelated...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 12 شماره
صفحات -
تاریخ انتشار 2017